01/30/2004 17:45 FAX 512 338 6301 



Zagorin OBrien & Graham -> USPTO-Central 009/018



PATENT 



AMENDMENTS TO THE CLAIMS 
Please amend the claims as indicated in the following listing of all claims: 




1. (original) A method of executing a single instruction parallel multiply-add function on
a processor, the method comprising:

providing the processor with an opcode indicating a parallel multiply-add instruction;
providing the processor with a first, a second and a third value, wherein each of the
values comprises two or more operand components;
multiplying first operand components of the first and the second values to generate a first
intermediate value;
multiplying second operand components of the first and the second values to generate a
second intermediate value;
adding a first operand component of the third value to the first intermediate value to
generate a first result value;
adding a second operand component of the third value to the second intermediate value to
generate a second result value;
storing the first result value in a first portion of a result location; and
storing the second result value in a second portion of the result location.
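For illustration only, and not as part of the claims: the steps recited in claim 1, together with the high-order/low-order placement recited in claim 3, can be sketched in Python as follows. The 16-bit component width, the 32-bit packed register, the wraparound on overflow, and the names LANE_BITS, unpack, pack and parallel_multiply_add are all assumptions made for this sketch; the claims do not specify them.

```python
LANE_BITS = 16                    # assumed component width; not fixed by the claims
LANE_MASK = (1 << LANE_BITS) - 1


def unpack(reg):
    """Split a packed register value into (high-order, low-order) components."""
    return (reg >> LANE_BITS) & LANE_MASK, reg & LANE_MASK


def pack(high, low):
    """Place `high` in the high-order bits and `low` in the low-order bits."""
    return ((high & LANE_MASK) << LANE_BITS) | (low & LANE_MASK)


def parallel_multiply_add(rs1, rs2, rs3):
    """Multiply matching components of rs1 and rs2, then add those of rs3.

    Each result is truncated to LANE_BITS (an assumption of this sketch).
    """
    a_hi, a_lo = unpack(rs1)
    b_hi, b_lo = unpack(rs2)
    c_hi, c_lo = unpack(rs3)
    first_result = a_hi * b_hi + c_hi    # first intermediate value + third value's first component
    second_result = a_lo * b_lo + c_lo   # second intermediate value + third value's second component
    return pack(first_result, second_result)


result = parallel_multiply_add(pack(3, 4), pack(5, 6), pack(7, 8))
print(unpack(result))
```

Under these assumptions, components (3, 4), (5, 6) and (7, 8) give 3*5+7 = 22 in the high-order half and 4*6+8 = 32 in the low-order half of the result.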

2. (original) The method of claim 1, wherein the first, second and third values are stored
in respective source registers of the processor specified by the parallel multiply-add instruction,
and the first and the second result values are stored in a destination register of the processor
specified by the parallel multiply-add instruction.

3. (original) The method of claim 2, the first result value is stored in the high-order bits
of the destination register and the second result value is stored in the low-order bits of the
destination register.

4. (original) The method of claim 1 wherein the processor is pipelined and the single
instruction is executed with a throughput of one instruction every 2 cycles.

5. (withdrawn) A method of executing a single instruction conditional pick function on a
processor, the method comprising:

providing the processor with an opcode indicating a conditional pick instruction;
providing the processor with a first, a second and a third value;
comparing the first value to a reference value;
determining, based upon the comparing, whether the first value is equal to the reference
value;
storing the second value in a result location if the first value is equal to the reference
value; and
storing the third value in a result location if the first value is not equal to the reference
value.
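For illustration only, and not as part of the claims: the comparison and selection recited in claim 5 can be sketched in Python as follows. The function name conditional_pick and the default reference value of zero are assumptions of this sketch; zero is used as the default only because claim 22 recites a comparison to zero.

```python
def conditional_pick(first, second, third, reference=0):
    """Return `second` if `first` equals `reference`, otherwise `third`.

    The returned value stands in for what would be stored in the result location.
    """
    if first == reference:
        return second
    return third


print(conditional_pick(0, 10, 20))
print(conditional_pick(5, 10, 20))
```

With these inputs the first call selects the second value (10) and the second call selects the third value (20).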



6. (withdrawn) The method of claim 5, wherein the first, second and third values are 
stored in respective source registers of the processor specified by the conditional pick instruction, 
and the second and the third values are stored in a destination register of the processor specified
by the conditional pick instruction. 

7. (withdrawn) The method of claim 5, wherein the processor is pipelined and the single 
instruction is executed with a throughput of one instruction per cycle. 

8. (withdrawn) A method of executing a single instruction parallel averaging function 
on a processor, the method comprising: 

providing the processor with an opcode indicating a parallel averaging instruction; 
providing the processor with a first and a second value, wherein each of the values
comprises two or more operand components;
adding first operand components of the first and the second values to generate a first
intermediate value;
adding second operand components of the first and the second values to generate a
second intermediate value;
incrementing the first intermediate value by one to generate a third intermediate value;
incrementing the second intermediate value by one to generate a fourth intermediate
value;
shifting the third intermediate value to generate a first result value;
shifting the fourth intermediate value to generate a second result value; 
storing the first result value in a first portion of a result location; and 
storing the second result value in a second portion of the result location. 
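For illustration only, and not as part of the claims: the add, increment and shift steps recited in claim 8 can be sketched in Python as follows. The 16-bit component width, the shift being a right shift by one bit, and the names LANE_BITS, unpack, pack and parallel_average are assumptions of this sketch; claim 8 recites only "shifting" without a direction or amount.

```python
LANE_BITS = 16                    # assumed component width; not fixed by the claims
LANE_MASK = (1 << LANE_BITS) - 1


def unpack(reg):
    """Split a packed register value into (high-order, low-order) components."""
    return (reg >> LANE_BITS) & LANE_MASK, reg & LANE_MASK


def pack(high, low):
    """Place `high` in the high-order bits and `low` in the low-order bits."""
    return ((high & LANE_MASK) << LANE_BITS) | (low & LANE_MASK)


def average_lane(a, b):
    """Add, increment by one, then shift right by one bit (assumed shift)."""
    intermediate = a + b              # first or second intermediate value
    incremented = intermediate + 1    # third or fourth intermediate value
    return incremented >> 1           # result value, rounded up on ties


def parallel_average(rs1, rs2):
    a_hi, a_lo = unpack(rs1)
    b_hi, b_lo = unpack(rs2)
    return pack(average_lane(a_hi, b_hi), average_lane(a_lo, b_lo))


print(unpack(parallel_average(pack(10, 7), pack(20, 8))))
```

Under these assumptions, components (10, 7) and (20, 8) average to 15 and 8. The increment before the shift is what rounds the odd sum 7 + 8 = 15 up to 8 rather than down to 7.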

9. (withdrawn) The method of claim 8, wherein the first and the second values are 
stored in respective source registers of the processor specified by the parallel averaging 
instruction. 



10. (withdrawn) The method of claim 8, wherein the first and the second result values 
are stored in a destination register of the processor specified by the parallel averaging instruction.

11. (withdrawn) The method of claim 10, the first result value is stored in the high-order 
bits of the destination register and the second result value is stored in the low-order bits of the
destination register. 

12. (withdrawn) The method of claim 8, wherein the processor is pipelined and the
single instruction is executed with a throughput of one instruction per cycle. 

13. (withdrawn) A method of executing a single instruction parallel shift function on a 
processor, the method comprising: 

providing the processor with an opcode indicating a parallel shift instruction; 
providing the processor with a first and a second value, wherein each of the values
comprises two or more operand components;
shifting the first operand component of the first value by a number of bits equal to a value
of the first operand component of the second value to generate a first result value;
shifting the second operand component of the first value by a number of bits equal to a
value of the second operand component of the second value to generate a second
result value;
storing the first result value in a first portion of a result location; and
storing the second result value in a second portion of the result location.

14. (withdrawn) The method of claim 13, wherein the first and the second values are
stored in respective source registers of the processor specified by the parallel shift instruction.

15. (withdrawn) The method of claim 13, wherein the first and the second result values 
are stored in a destination register of the processor specified by the parallel shift instruction. 
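For illustration only, and not as part of the claims: the per-component shifting recited in claim 13 can be sketched in Python as follows. The 16-bit component width, the choice of a logical left shift, the treatment of shift counts at or above the component width as producing zero, and the names LANE_BITS, unpack, pack, shift_lane and parallel_shift are assumptions of this sketch; claim 13 does not state a shift direction.

```python
LANE_BITS = 16                    # assumed component width; not fixed by the claims
LANE_MASK = (1 << LANE_BITS) - 1


def unpack(reg):
    """Split a packed register value into (high-order, low-order) components."""
    return (reg >> LANE_BITS) & LANE_MASK, reg & LANE_MASK


def pack(high, low):
    """Place `high` in the high-order bits and `low` in the low-order bits."""
    return ((high & LANE_MASK) << LANE_BITS) | (low & LANE_MASK)


def shift_lane(value, count):
    """Logical left shift of one component, truncated to LANE_BITS (assumed)."""
    if count >= LANE_BITS:
        return 0                  # every bit is shifted out of the component
    return (value << count) & LANE_MASK


def parallel_shift(rs1, rs2):
    """Shift each component of rs1 by the matching component of rs2."""
    v_hi, v_lo = unpack(rs1)
    n_hi, n_lo = unpack(rs2)
    return pack(shift_lane(v_hi, n_hi), shift_lane(v_lo, n_lo))


print(unpack(parallel_shift(pack(1, 3), pack(4, 2))))
```

Under these assumptions, components (1, 3) shifted by (4, 2) give 16 and 12. Each component carries its own shift count, so the two halves of the register can be shifted by different amounts in one instruction.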



16. (withdrawn) The method of claim 15, the first result value is stored in the high-order
bits of the destination register and the second result value is stored in the low-order bits of the
destination register.

17. (withdrawn) The method of claim 13, wherein the processor is pipelined and the
single instruction is executed with a throughput of one instruction per cycle.

18. (currently amended) A general purpose processor comprising:
a file register,
a first and second multiplier paths;
a first and second adder paths; and
an instruction fetch unit; and decoding circuitry;

wherein the processor supports a parallel multiply-add instruction, the parallel multiply
add instruction executable to cause the processor to,
in parallel, route a first component of a first operand and a first component of a
second operand to the first multiplier path and a second component of the
first operand and a second component of the second operand to the second
multiplier path,
in parallel route output of the first multiplier path and a first component of a third
operand to the first adder path, and output of the second multiplier path
and a second component of the third operand to the second adder path, and
store output of the first adder path at a first location and output of the second
adder path at a second location.



19. (original) The general purpose processor of claim 18, wherein the parallel multiply-add
instruction operates on either

20. (original) The general purpose processor of claim 19, wherein the results of the
parallel multiply-add instruction are saturated.



21. (original) The general purpose processor of claim 19, wherein the processor parallel
multiply add instruction further provides multiple saturation modes.
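For illustration only, and not as part of the claims: saturation of multiply-add results, as referred to in claims 20 and 21, can be sketched in Python as follows. The 16-bit result width, the two modes shown (signed and unsigned clamping), and the names saturate and saturating_multiply_add are hypothetical choices for this sketch; the claims do not identify the saturation modes.

```python
def saturate(value, signed=True, bits=16):
    """Clamp `value` to the range representable in `bits` bits.

    signed=True  clamps to [-2**(bits-1), 2**(bits-1) - 1]
    signed=False clamps to [0, 2**bits - 1]
    Both modes are assumptions of this sketch.
    """
    if signed:
        low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        low, high = 0, (1 << bits) - 1
    return max(low, min(high, value))


def saturating_multiply_add(a, b, c, signed=True):
    """Per-component a*b + c with saturation; a, b, c are (first, second) tuples."""
    return tuple(saturate(x * y + z, signed) for x, y, z in zip(a, b, c))


print(saturating_multiply_add((300, 2), (300, 3), (0, 4)))
print(saturating_multiply_add((300, 2), (300, 3), (0, 4), signed=False))
```

In this sketch the first component, 300*300 = 90000, exceeds both ranges and is clamped to 32767 in the signed mode and 65535 in the unsigned mode, while the second component, 2*3+4 = 10, passes through unchanged.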

22. (currently amended) A general purpose The processor of claim 18, comprising:
a file register;

an instruction fetch unit; and decoding circuitry;

wherein the processor further supports a conditional pick instruction, the conditional pick
instruction executable to cause the processor to compare a first value to zero and
to copy either a second value or a third value to a destination location depending
on the comparison.

23. (currently amended) A general purpose The processor of claim 18, comprising:
a file register;

an instruction fetch unit; and decoding circuitry;

wherein the processor further supports a parallel averaging instruction, the parallel
averaging instruction executable to cause the processor to average a first
operand's first component and a second operand's first component, and, in
parallel, to average the first operand's second component and the second
operand's second component.



24. (withdrawn) A general purpose The processor of claim 18, comprising:
a file register;

an instruction fetch unit; and decoding circuitry;

wherein the processor further supports a parallel shift instruction, the parallel shift
instruction executable to cause the processor to logically shift a first portion of a
first value in accordance with a first portion of a second value, and, in parallel,
shift a second portion of the first value in accordance with a second portion of the
second value.






Application No.: 09/640,901



PAGE 13/18 * RCVD AT 1/30/2004 6:43:43 PM [Eastern Standard Time] * SVR:USPTO-EFXRF-1/0 * DNIS:8729306 * CSID:512 338 6301 * DURATION (mm-ss):05-22




25. (currently amended) The A general purpose processor of claim 18 comprising:
a file register;

an instruction fetch unit; and
decoding circuitry;

wherein the processor further supports a parallel power instruction, the parallel power
instruction executable to cause the processor to,

raise a first component of a first operand to a power indicated in a first component
of a second operand and, in parallel, raise a second component of a the
first operand to a power indicated in a second component of the second
operand.



26. (currently amended) The A general purpose processor of claim 18 comprising:
a file register;

an instruction fetch unit; and decoding circuitry;
wherein the processor further supports a parallel reciprocal square root instruction, the
parallel reciprocal square root instruction executable to cause the processor to,
determine a reciprocal square root of an operand's first component and, in
parallel, determine a reciprocal square root of the operand's second
component.



27. (new) A computer program product encoded on one or more machine-readable
media, the computer program product comprising:

an instruction sequence, the instruction sequence including an instance of a parallel
multiply add instruction;
the instance of the parallel multiply add instruction having an at least four operand
instruction format,
wherein execution of the parallel multiply add instruction

causes generation of a first product from a first operand's first component and a
second operand's first component, in parallel with generation of a second
product from the first operand's second component and the second
operand's second component,
causes generation of a first sum from the first product and a third operand's first
component, in parallel with generation of a second sum from the second
product and the third operand's second component, and
causes the first sum to be stored in accordance with a fourth operand's first
component and the second sum to be stored in accordance with the fourth
operand's second component.

28. (new) The computer program product of claim 27, wherein the operands include one
or more of a fixed point format and an integer format. 

29. (new) The computer program product of claim 27, wherein the first components
correspond to the high order bits of the respective operands and the second components 
correspond to the low order bits of the respective operands. 

30. (new) An apparatus comprising: 
a plurality of registers; and

means for performing, in response to a single instruction instance, a parallel multiply add
operation, the parallel multiply add operation causing generation of a first product
and a second product in parallel, and causing generation of a first sum and second
sum in parallel, wherein an input value for the first sum includes the first product
and an input value for the second sum includes the second product.

31. (new) The apparatus of claim 30, further comprising a plurality of multipliers and
adders.

32. (new) The apparatus of claim 30, wherein the parallel multiply add operation further 
causes storing of the first sum in a first portion of a first of the plurality of registers and storing
of the second sum in a second portion of the first register. 

33. (new) A method of executing an instruction instance comprising:

generating a first product and a second product in parallel, wherein the first product is
from a first and second value and the second product is from a third and fourth
value; and

generating a first sum and a second sum in parallel, wherein the first sum is from the first
product and a fifth value and the second sum is from the second product and a
sixth value.



34. (new) The method of claim 33, wherein the first and third values respectively are
first and second portions of a first operand, the second and fourth values respectively are first
and second portions of a second operand, and the fifth and sixth values respectively are first and 
second portions of a third operand. 



35. (new) The method of claim 33 further comprising storing, in parallel, the first sum in 
a first location and the second sum in a second location. 

36. (new) The method of claim 35, wherein the first location is a first portion of a 
destination register and the second location is a second portion of the destination register. 

36. (new) The method of claim 33 wherein the instruction instance is executed by a 
pipelined processor that performs operations for the instruction instance in 2 cycles. 

37. (new) The method of claim 33 embodied as a computer program product encoded in
one or more machine-readable media.

38. (new) The processor of claim 18, wherein the first store location is a first part of a
register and the second store location is a second part of the register.

39. (new) The processor of claim 18, wherein the first store location is a first register and
the second store location is a second register.



40. (new) The processor of claim 18, wherein the first and second multiplier paths are
embodied as distinct functional units.

41. (new) The processor of claim 18, wherein the first and second adder paths are
embodied as distinct functional units.



42. (new) The processor of claim 23 further comprising:
a plurality of adder paths; and
a plurality of shifter paths;

wherein the parallel averaging instruction, when executed, causes the processor to,

route the first operand's first component and the second operand's second
component to a first of the plurality of adder paths, and, in parallel, route
the first operand's second component and the second operand's second
component to a second of the plurality of adder paths;

after propagation delay, route output of the first adder path and a one value to a
third of the plurality of adder paths, and, in parallel, route output of the
second adder path and a one value to fourth of the plurality of adder paths;

after propagation delay, route output of the third adder path and a first control
value a first of the plurality of shifter paths, and, in parallel, route output
of the fourth adder path and a second control value to a second of the
plurality of shifter paths.